Function Unit Clustering in Wide-Issue Superscalar Processors
Abstract
As more function units are integrated into wide-issue superscalar processors and as cycle times decrease, result-forwarding delays will become worse relative to processor cycle time. Physical distance and the capacitive effects of smaller-geometry wires are the main reasons for this increase in delay. Thus, a full bypass network, able to forward results from any function unit to any other function unit, cannot be realized without increasing the cycle time. Since increasing the cycle time is undesirable, this paper examines clustering of function units as an approximation to an ideal design in which there is no inter-function-unit communication delay. A cluster of function units is simply a grouping of neighboring function units with fast intra-cluster communication. Communication between clusters incurs a higher penalty because of the distance between the clusters. This complicates function unit selection compared with the ideal case, where there is no delay between function units. The goal of this work is to compare the performance of a clustered microengine with that of an ideal one that has the same function unit resources but no communication delay between function units. The results presented in this paper show that clustered configurations of function units can perform competitively with an ideal non-clustered configuration. They also show that the steering heuristic makes a significant difference in performance in some cases, particularly when the inter-cluster bypass delay is high.
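The abstract does not say which steering heuristics the paper evaluates, so the following is only a minimal sketch of one plausible heuristic, not the authors' method. It assumes that forwarding within a cluster costs no extra cycles, that forwarding between clusters costs a fixed `inter_cluster_delay`, and that an instruction is steered to the cluster where all of its operands arrive earliest. The names `Operand`, `arrival_cycle`, `earliest_issue`, and `steer` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    """A source value: the cluster producing it and the cycle it is ready there."""
    cluster: int
    ready_cycle: int


def arrival_cycle(op: Operand, target_cluster: int, inter_cluster_delay: int) -> int:
    """Cycle at which `op` can be consumed in `target_cluster`.

    Assumption: intra-cluster forwarding adds no delay; crossing clusters
    adds a fixed `inter_cluster_delay`.
    """
    penalty = 0 if op.cluster == target_cluster else inter_cluster_delay
    return op.ready_cycle + penalty


def earliest_issue(operands, target_cluster, inter_cluster_delay):
    """Earliest cycle an instruction could issue in `target_cluster`."""
    return max(
        (arrival_cycle(op, target_cluster, inter_cluster_delay) for op in operands),
        default=0,  # an instruction with no source operands can issue at once
    )


def steer(operands, num_clusters, inter_cluster_delay):
    """Pick the cluster with the earliest issue cycle; ties go to the lowest index."""
    return min(
        range(num_clusters),
        key=lambda c: (earliest_issue(operands, c, inter_cluster_delay), c),
    )


if __name__ == "__main__":
    ops = [Operand(cluster=1, ready_cycle=10), Operand(cluster=0, ready_cycle=2)]
    # Cluster 0 would wait for the late operand to cross clusters.
    print(steer(ops, num_clusters=2, inter_cluster_delay=3))
```

Setting `inter_cluster_delay` to 0 models the ideal non-clustered configuration, in which every cluster yields the same issue cycle and the steering choice no longer matters. A real steering policy would also have to account for function unit availability and load balance, which this sketch ignores.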
Related resources
Block-Level Prediction for Wide-Issue Superscalar Processors
Changes in control flow, caused primarily by conditional branches, are a prime impediment to the performance of wide-issue superscalar processors. This paper investigates a block-level prediction scheme to mitigate the effects of control flow changes caused by conditional branches. Instead of predicting the outcome of each conditional branch individually, this scheme predicts the target of a sequent...
Evaluating a Multithreaded Superscalar Microprocessor versus a Multiprocessor Chip
This paper examines implementation techniques for future generations of microprocessors. While the wide superscalar approach, which issues 8 and more instructions per cycle from a single thread, fails to yield a satisfying performance, its combination with techniques that utilize more coarse-grained parallelism is very promising. These techniques are multithreading and multiprocessing. Multi-th...
A Direct-Execution Framework for Fast and Accurate Simulation of Superscalar Processors
Multiprocessor system evaluation has traditionally been based on direct-execution based Execution-Driven Simulations (EDS). In such environments, the processor component of the system is not fully modeled. With wide-issue superscalar processors being the norm in today's multiprocessor nodes, there is an urgent need for modeling the processor accurately. However, using direct-execution to model...
Multithreaded Processors
The instruction-level parallelism found in a conventional instruction stream is limited. Studies have shown the limits of processor utilization even for today's superscalar microprocessors. One solution is the additional utilization of more coarse-grained parallelism. The main approaches are the (single) chip multiprocessor and the multithreaded processor which optimize the throughput of multip...
Superscalar instruction issue
Clearly, instruction issue and execution are closely related: The more parallel the instruction execution, the higher the requirements for the parallelism of instruction issue. Thus, we see the continuous and harmonized increase of parallelism in instruction issue and execution. This article focuses on superscalar instruction issue, tracing the way parallel instruction execution and issue have i...
Publication date: 2007